AITopics | informative data

informative data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

An adaptive data sampling strategy for stabilizing dynamical systems via controller inference

Werner, Steffen W. R., Peherstorfer, Benjamin

arXiv.org Artificial Intelligence, Jun-3-2025

Learning stabilizing controllers from data is an important task in engineering applications; however, collecting informative data is challenging because unstable systems often lead to rapidly growing or erratic trajectories. In this work, we propose an adaptive sampling scheme that generates data while simultaneously stabilizing the system to avoid instabilities during the data collection. Under mild assumptions, the approach provably generates data sets that are informative for stabilization and have minimal size. The numerical experiments demonstrate that controller inference with the novel adaptive sampling approach learns controllers with up to one order of magnitude fewer data samples than unguided data generation. The results show that the proposed approach opens the door to stabilizing systems in edge cases and limit states where instabilities often occur and data collection is inherently difficult.
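
The difficulty the abstract describes can be illustrated with a toy scalar system. The sketch below is not the authors' adaptive sampling scheme; the system coefficients, the feedback gain, and the small alternating probing input are all assumptions chosen for illustration. It only shows that data collected from an unstable system grows rapidly unless a stabilizing input is applied while sampling.

```python
from typing import List


def simulate(a: float, b: float, gain: float, x0: float, steps: int) -> List[float]:
    """Simulate x[k+1] = a*x[k] + b*u[k] with u[k] = -gain*x[k] + probe[k]."""
    xs = [x0]
    for k in range(steps):
        # Small alternating probing input (an assumption) so the data stays excited.
        probe = 0.1 if k % 2 == 0 else -0.1
        u = -gain * xs[-1] + probe
        xs.append(a * xs[-1] + b * u)
    return xs


A, B = 1.5, 1.0  # |A| > 1, so the uncontrolled system is unstable

# Unguided collection: no feedback, the trajectory blows up.
open_loop = simulate(A, B, gain=0.0, x0=1.0, steps=20)

# Collection with feedback: A - B*gain = 0.5, so the trajectory stays bounded.
closed_loop = simulate(A, B, gain=1.0, x0=1.0, steps=20)

print(f"open loop   max |x| = {max(abs(x) for x in open_loop):.1f}")
print(f"closed loop max |x| = {max(abs(x) for x in closed_loop):.1f}")
```

In this toy setting the stabilizing gain is known in advance; the point of the paper, as the abstract states it, is to learn such a controller from the data while it is being collected.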

artificial intelligence, controller, machine learning, (16 more...)

arXiv:2506.01816

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science (0.67)

Avoid Wasted Annotation Costs in Open-set Active Learning with Pre-trained Vision-Language Model

Heo, Jaehyuk, Kang, Pilsung

arXiv.org Artificial Intelligence, Aug-9-2024

Active learning (AL) aims to enhance model performance by selectively collecting highly informative data, thereby minimizing annotation costs. However, in practical scenarios, unlabeled data may contain out-of-distribution (OOD) samples, leading to wasted annotation costs if data is incorrectly selected. Recent research has explored methods to apply AL to open-set data, but these methods often either require OOD samples or incur unavoidable cost losses. To address these challenges, we propose a novel selection strategy, CLIPN for AL (CLIPNAL), which minimizes cost losses without requiring OOD samples. CLIPNAL sequentially evaluates the purity and informativeness of data. First, it utilizes a pre-trained vision-language model to detect and exclude OOD data by leveraging linguistic and visual information of in-distribution (ID) data without additional training. Second, it selects highly informative data from the remaining ID data, and then the selected samples are annotated by human experts. Experimental results on datasets with various open-set conditions demonstrate that CLIPNAL achieves the lowest cost loss and highest performance across all scenarios. Code is available at https://github.com/DSBA-Lab/OpenAL.
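
The two-stage structure described above (filter for purity first, then rank by informativeness) can be sketched in a few lines. This is a minimal illustration, not the CLIPNAL implementation: the function name `select_queries`, the `purity_threshold` parameter, and the example scores are hypothetical, and the in-distribution scores are assumed to come from some external detector rather than from a vision-language model.

```python
from typing import List, Sequence


def select_queries(
    id_scores: Sequence[float],
    uncertainties: Sequence[float],
    budget: int,
    purity_threshold: float = 0.5,
) -> List[int]:
    """Return indices of samples to send for annotation.

    Stage 1 (purity): keep samples whose in-distribution score passes the threshold.
    Stage 2 (informativeness): among those, pick the most uncertain ones.
    """
    if len(id_scores) != len(uncertainties):
        raise ValueError("id_scores and uncertainties must have the same length")
    kept = [i for i, score in enumerate(id_scores) if score >= purity_threshold]
    ranked = sorted(kept, key=lambda i: uncertainties[i], reverse=True)
    return ranked[:budget]


# Hypothetical scores for five unlabeled samples.
id_scores = [0.9, 0.2, 0.8, 0.7, 0.1]        # higher = more likely in-distribution
uncertainties = [0.1, 0.95, 0.6, 0.4, 0.9]   # higher = more informative
selected = select_queries(id_scores, uncertainties, budget=2)
print(selected)
```

Samples 1 and 4 have the highest uncertainty but are filtered out in the first stage, which is the situation in which a purely informativeness-based query would waste annotation budget on OOD data.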

informativeness, ood data, query strategy, (17 more...)

arXiv:2408.04917

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Releasing Malevolence from Benevolence: The Menace of Benign Data on Machine Unlearning

Ma, Binhao, Zheng, Tianhang, Hu, Hongsheng, Wang, Di, Wang, Shuo, Ba, Zhongjie, Qin, Zhan, Ren, Kui

arXiv.org Artificial Intelligence, Jul-6-2024

Machine learning models trained on vast amounts of real or synthetic data often achieve outstanding predictive performance across various domains. However, this utility comes with increasing concerns about privacy, as the training data may include sensitive information. To address these concerns, machine unlearning has been proposed to erase specific data samples from models. While some unlearning techniques efficiently remove data at low costs, recent research highlights vulnerabilities where malicious users could request unlearning on manipulated data to compromise the model. Despite these attacks' effectiveness, perturbed data differs from original training data, failing hash verification. Existing attacks on machine unlearning also suffer from practical limitations and require substantial additional knowledge and resources. To fill the gaps in current unlearning attacks, we introduce the Unlearning Usability Attack. This model-agnostic, unlearning-agnostic, and budget-friendly attack distills data distribution information into a small set of benign data. These data are identified as benign by automatic poisoning detection tools due to their positive impact on model training. While benign for machine learning, unlearning these data significantly degrades model information. Our evaluation demonstrates that unlearning this benign data, comprising no more than 1% of the total training data, can reduce model accuracy by up to 50%. Furthermore, our findings show that well-prepared benign data poses challenges for recent unlearning techniques, as erasing these synthetic instances demands higher resources than regular data. These insights underscore the need for future research to reconsider "data poisoning" in the context of machine unlearning.

dataset, informative benign data, informative data, (12 more...)

arXiv:2407.05112

Country:

North America > United States > California (0.04)
Asia > Nepal (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Physics-Enhanced Machine Learning: a position paper for dynamical systems investigations

Cicirello, Alice

arXiv.org Artificial Intelligence, Jun-8-2024

This position paper takes a broad look at Physics-Enhanced Machine Learning (PEML), also known as Scientific Machine Learning, with a particular focus on those PEML strategies developed to tackle dynamical systems' challenges. The need to go beyond Machine Learning (ML) strategies is driven by: (i) the limited volume of informative data; (ii) avoiding accurate-but-wrong predictions; (iii) dealing with uncertainties; (iv) providing explainable and interpretable inferences. A general definition of PEML is provided by considering four physics and domain knowledge biases, and three broad groups of PEML approaches are discussed: physics-guided, physics-encoded and physics-informed. The advantages and challenges in developing PEML strategies for guiding high-consequence decision making in engineering applications involving complex dynamical systems are presented.

dynamical system, machine learning, physics-based model, (14 more...)

arXiv:2405.05987

Country:

North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Industry: Energy (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Active Learning: Learning with Limited Labeled Data in Python (Scikit-learn, Active Learning Lib) - Code Armada, LLC

#artificialintelligence, Apr-11-2023, 11:35:28 GMT

Active Learning is a machine learning approach that enables the selection of the most informative data points to be labeled by an oracle, thereby reducing the number of labeled data points required to train a model. Active Learning is useful in scenarios where labeled data is limited or expensive to acquire. Active Learning can help improve the accuracy of machine learning models with fewer labeled data points.

Learning with Limited Labeled Data in Python

Python is a popular language for machine learning, and several libraries support Active Learning. In this tutorial, we will use the Scikit-learn library to train a model and the Active Learning library to select informative data points to be labeled.

Import Libraries

We will start by importing the necessary libraries, including Scikit-learn for training the model, NumPy for numerical computations, and the Active Learning library for selecting informative data points to be labeled.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from modAL.uncertainty import uncertainty_sampling

Generate Data

Next, we will generate some random data for training and testing the model.

    # Generate random data for […]
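
Since the excerpt is cut off before the selection step, here is a minimal, dependency-free sketch of what selecting "the most informative data points" can mean under a least-confidence criterion. It is not the tutorial's code: the function names and the example probabilities are hypothetical. In the tutorial's setting, the probabilities would presumably come from a fitted classifier's `predict_proba` on the unlabeled pool.

```python
from typing import List, Sequence


def least_confidence_scores(probabilities: Sequence[Sequence[float]]) -> List[float]:
    """Uncertainty of each sample: 1 minus the probability of its most likely class."""
    return [1.0 - max(row) for row in probabilities]


def query_most_uncertain(
    probabilities: Sequence[Sequence[float]], n_instances: int = 1
) -> List[int]:
    """Return indices of the n_instances samples the model is least sure about."""
    scores = least_confidence_scores(probabilities)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n_instances]


# Hypothetical class probabilities for four unlabeled samples (two classes).
pool_probabilities = [
    [0.95, 0.05],  # confident
    [0.55, 0.45],  # uncertain
    [0.20, 0.80],  # fairly confident
    [0.50, 0.50],  # maximally uncertain
]
picked = query_most_uncertain(pool_probabilities, n_instances=2)
print(picked)
```

The selected indices would then be sent to the oracle for labeling, added to the training set, and the model refit before the next query round.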

active learning, artificial intelligence, machine learning, (13 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.52)

Online Symbolic Regression with Informative Query

Jin, Pengwei, Huang, Di, Zhang, Rui, Hu, Xing, Nan, Ziyuan, Du, Zidong, Guo, Qi, Chen, Yunji

arXiv.org Artificial Intelligence, Feb-21-2023

Symbolic regression, the task of extracting mathematical expressions from the observed data $\{ \mathbf{x}_i, y_i \}$, plays a crucial role in scientific discovery. Despite the promising performance of existing methods, most of them conduct symbolic regression in an offline setting. That is, they treat the observed data points as given ones that are simply sampled from uniform distributions without exploring the expressive potential of data. However, for real-world scientific problems, the data used for symbolic regression are usually actively obtained by doing experiments, which is an online setting. Thus, how to obtain informative data that can facilitate the symbolic regression process is an important problem that remains challenging. In this paper, we propose QUOSR, a QUery-based framework for Online Symbolic Regression that can automatically obtain informative data in an iterative manner. Specifically, at each step, QUOSR receives historical data points, generates new $\mathbf{x}$, and then queries the symbolic expression to get the corresponding $y$, where the $(\mathbf{x}, y)$ serves as new data points. This process repeats until the maximum number of query steps is reached. To make the generated data points informative, we implement the framework with a neural network and train it by maximizing the mutual information between generated data points and the target expression. Through comprehensive experiments, we show that QUOSR can facilitate modern symbolic regression methods by generating informative data.
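
The iterative query loop described in the abstract (receive the history, propose a new input, query the target expression, append the pair, repeat up to a step limit) can be sketched as follows. This is only an outline of the loop structure: the abstract's query generator is a neural network trained to maximize mutual information, whereas `spread_out_policy` below is a hypothetical stand-in that simply picks the candidate farthest from previously queried inputs. The target expression and all names are assumptions for illustration.

```python
import math
from typing import Callable, List, Tuple

History = List[Tuple[float, float]]


def online_query_loop(
    target: Callable[[float], float],
    propose: Callable[[History], float],
    max_steps: int,
) -> History:
    """Collect (x, y) pairs by repeatedly proposing x and querying the target."""
    history: History = []
    for _ in range(max_steps):
        x = propose(history)    # generate a new query from the data seen so far
        y = target(x)           # "run the experiment" on the hidden expression
        history.append((x, y))  # the new pair joins the historical data
    return history


def spread_out_policy(
    history: History, low: float = -2.0, high: float = 2.0, candidates: int = 41
) -> float:
    """Hypothetical placeholder policy: choose the grid point farthest from past queries."""
    grid = [low + (high - low) * k / (candidates - 1) for k in range(candidates)]
    if not history:
        return grid[len(grid) // 2]
    seen = [x for x, _ in history]
    return max(grid, key=lambda c: min(abs(c - s) for s in seen))


def hidden_expression(x: float) -> float:
    """Assumed target expression, unknown to the policy."""
    return x * x + math.sin(x)


data = online_query_loop(hidden_expression, spread_out_policy, max_steps=5)
for x, y in data:
    print(f"x = {x:+.2f}, y = {y:+.4f}")
```

The collected pairs would then be handed to a symbolic regression method; swapping `spread_out_policy` for a learned generator is where the approach described in the abstract differs from this sketch.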

artificial intelligence, expression, machine learning, (17 more...)

arXiv:2302.10539

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report (0.50)
Workflow (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Localized active learning of Gaussian process state space models

Capone, Alexandre, Umlauft, Jonas, Beckers, Thomas, Lederer, Armin, Hirche, Sandra

arXiv.org Machine Learning, Jun-9-2020

The performance of learning-based control techniques crucially depends on how effectively the system is explored. While most exploration techniques aim to achieve a globally accurate model, such approaches are generally unsuited for systems with unbounded state spaces. Furthermore, a globally accurate model is not required to achieve good performance in many common control applications, e.g., local stabilization tasks. In this paper, we propose an active learning strategy for Gaussian process state space models that aims to obtain an accurate model on a bounded subset of the state-action space. Our approach aims to maximize the mutual information of the exploration trajectories with respect to a discretization of the region of interest. By employing model predictive control, the proposed technique integrates information collected during exploration and adaptively improves its exploration strategy. To enable computational tractability, we decouple the choice of most informative data points from the model predictive control optimization step. This yields two optimization problems that can be solved in parallel. We apply the proposed method to explore the state space of various dynamical systems and compare our approach to a commonly used entropy-based exploration strategy. In all experiments, our method yields a better model within the region of interest than the entropy-based method.

gaussian process, optimization problem, upstream oil & gas, (19 more...)

arXiv:2005.02191

Country:

Europe > Germany (0.14)
North America > United States (0.14)

Genre: Research Report (0.40)

Industry: Energy > Oil & Gas > Upstream (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)
